In 2015, studies were conducted in a mouse model of how Down syndrome affects changes in the levels of various proteins.

We have uploaded a dataset for analysis from the link

1 Dataset description

Analysis of the dataset showed that the experiment involved 72 mice.

For each mouse, it was planned to carry out 15 measurements of its state, unfortunately, not all of them were performed in full. There are 1080 measurments in dataset but only 552 are full (without missing values).

We can distinguish 8 different classes of mice.

The description of these classes can be found here:

Classes:

  • c-CS-s: control mice, stimulated to learn, injected with saline (9 mice)
  • c-CS-m: control mice, stimulated to learn, injected with memantine (10 mice)
  • c-SC-s: control mice, not stimulated to learn, injected with saline (9 mice)
  • c-SC-m: control mice, not stimulated to learn, injected with memantine (10 mice)

  • t-CS-s: trisomy mice, stimulated to learn, injected with saline (7 mice)
  • t-CS-m: trisomy mice, stimulated to learn, injected with memantine (9 mice)
  • t-SC-s: trisomy mice, not stimulated to learn, injected with saline (9 mice)
  • t-SC-m: trisomy mice, not stimulated to learn, injected with memantine (9 mice)

For each class in dataset there are n measurments:

class n
c-CS-m 150
c-CS-s 135
c-SC-m 150
c-SC-s 135
t-CS-m 135
t-CS-s 105
t-SC-m 135
t-SC-s 135

2 Differences in BDNF_N product level depending on class in experiment

We have built boxplots to visually represent differences in BDNF_N protein expression in different classes.

Next, we applied one-way ANOVA to pinpoint the existence of differences.

However, first we checked the limits of its applicability.

We used Bartlett test to check dispersion homogentity and found dispersion heteroskedasticity with p_value = 1.381709410^{-10}.

However given that the number of observations in groups is close to the same, we can apply one-way ANOVA and Tukey’s test in order to establish differences in groups

As we can see p-value is less then 0.05 so there are significant differences in mean of groups. We applied the Tukey criterion in order to find out between which groups there are differences.

The figure shows the differences between the group averages (Differences in mean levels of class) and their confidence intervals, calculated taking into account the control over the group error probability (95% family-wise confidence level). In 13 cases, the confidence intervals include 0, indicating no difference between the respective groups. We can conclude that there are significant differences between the other groups.

3 Linear model

We built a multiple linear model, in which ERBB4_N protein level acted as a dependent variable, and all other protein levels as independent predictors

## 
## Call:
## lm(formula = ERBB4_N ~ . - pS6_N, data = df_protein_sub)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023037 -0.003370 -0.000127  0.003071  0.031955 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2.566e-02  8.245e-03   3.112 0.001970 ** 
## DYRK1A_N        -6.824e-03  7.475e-03  -0.913 0.361710    
## ITSN1_N          8.127e-03  1.058e-02   0.768 0.442776    
## BDNF_N           3.833e-02  1.919e-02   1.997 0.046378 *  
## NR1_N           -1.409e-02  6.685e-03  -2.108 0.035544 *  
## NR2A_N          -1.049e-04  1.361e-03  -0.077 0.938610    
## pAKT_N           7.961e-02  3.044e-02   2.615 0.009203 ** 
## pBRAF_N         -4.103e-02  3.065e-02  -1.339 0.181356    
## pCAMKII_N       -1.451e-05  7.954e-04  -0.018 0.985451    
## pCREB_N         -6.015e-02  2.910e-02  -2.067 0.039290 *  
## pELK_N           4.731e-05  1.030e-03   0.046 0.963391    
## pERK_N          -1.987e-02  7.887e-03  -2.519 0.012109 *  
## pJNK_N          -6.035e-02  2.538e-02  -2.378 0.017814 *  
## PKCA_N           8.377e-03  2.736e-02   0.306 0.759625    
## pMEK_N           8.498e-03  2.706e-02   0.314 0.753676    
## pNR1_N          -2.566e-02  1.334e-02  -1.924 0.054968 .  
## pNR2A_N          1.004e-02  7.239e-03   1.388 0.165927    
## pNR2B_N          1.928e-02  6.659e-03   2.896 0.003959 ** 
## pPKCAB_N         5.727e-03  2.773e-03   2.065 0.039457 *  
## pRSK_N           1.001e-02  1.427e-02   0.701 0.483573    
## AKT_N           -6.738e-04  8.183e-03  -0.082 0.934413    
## BRAF_N           1.550e-02  1.213e-02   1.278 0.201926    
## CAMKII_N        -3.429e-03  2.011e-02  -0.170 0.864700    
## CREB_N          -1.450e-02  3.091e-02  -0.469 0.639135    
## ELK_N            3.510e-03  3.719e-03   0.944 0.345811    
## ERK_N            4.660e-03  2.804e-03   1.662 0.097208 .  
## GSK3B_N         -5.840e-03  6.824e-03  -0.856 0.392588    
## JNK_N           -1.188e-02  3.378e-02  -0.352 0.725317    
## MEK_N            8.785e-03  2.197e-02   0.400 0.689510    
## TRKA_N           5.722e-03  1.646e-02   0.348 0.728257    
## RSK_N           -2.299e-02  3.915e-02  -0.587 0.557214    
## APP_N           -5.345e-03  1.825e-02  -0.293 0.769751    
## Bcatenin_N       1.107e-03  5.298e-03   0.209 0.834546    
## SOD1_N          -6.241e-03  3.985e-03  -1.566 0.118018    
## MTOR_N           4.161e-02  1.651e-02   2.520 0.012048 *  
## P38_N           -1.619e-02  1.348e-02  -1.201 0.230267    
## pMTOR_N         -9.530e-03  7.667e-03  -1.243 0.214475    
## DSCR1_N         -6.849e-03  1.023e-02  -0.670 0.503374    
## AMPKA_N          2.522e-02  2.294e-02   1.099 0.272263    
## NR2B_N           7.391e-03  9.146e-03   0.808 0.419449    
## pNUMB_N         -6.467e-03  2.006e-02  -0.322 0.747326    
## RAPTOR_N         4.750e-02  2.465e-02   1.927 0.054629 .  
## TIAM1_N         -3.284e-02  1.956e-02  -1.679 0.093892 .  
## pP70S6_N         2.739e-03  5.307e-03   0.516 0.606022    
## NUMB_N          -3.304e-02  3.784e-02  -0.873 0.383042    
## P70S6_N         -4.357e-03  5.119e-03  -0.851 0.395118    
## pGSK3B_N         1.291e-01  4.063e-02   3.179 0.001577 ** 
## pPKCG_N         -7.636e-03  1.962e-03  -3.893 0.000113 ***
## CDK5_N           7.111e-04  9.807e-03   0.073 0.942225    
## S6_N             1.566e-02  7.647e-03   2.049 0.041059 *  
## ADARB1_N        -2.609e-03  2.040e-03  -1.279 0.201557    
## AcetylH3K9_N     2.113e-03  1.103e-02   0.192 0.848199    
## RRP1_N          -1.488e-02  1.038e-02  -1.434 0.152232    
## BAX_N           -1.304e-01  3.542e-02  -3.680 0.000260 ***
## ARC_N            1.687e-01  6.748e-02   2.499 0.012775 *  
## nNOS_N          -4.749e-03  2.854e-02  -0.166 0.867918    
## Tau_N            7.082e-02  1.810e-02   3.913 0.000104 ***
## GFAP_N          -4.703e-02  5.400e-02  -0.871 0.384291    
## GluR3_N         -7.133e-03  2.436e-02  -0.293 0.769812    
## GluR4_N         -7.030e-02  3.373e-02  -2.084 0.037672 *  
## IL1B_N           2.845e-02  1.081e-02   2.631 0.008794 ** 
## P3525_N          4.994e-02  2.423e-02   2.061 0.039810 *  
## pCASP9_N         8.118e-03  3.044e-03   2.667 0.007923 ** 
## PSD95_N          2.379e-02  3.811e-03   6.242 9.55e-10 ***
## SNCA_N          -1.137e-02  3.484e-02  -0.326 0.744195    
## Ubiquitin_N      2.330e-03  4.883e-03   0.477 0.633468    
## pGSK3B_Tyr216_N  2.808e-02  1.006e-02   2.793 0.005439 ** 
## SHH_N            4.904e-02  1.998e-02   2.454 0.014471 *  
## BAD_N           -3.592e-02  3.508e-02  -1.024 0.306311    
## BCL2_N           6.173e-03  3.112e-02   0.198 0.842840    
## pCFOS_N         -1.232e-02  3.111e-02  -0.396 0.692240    
## SYP_N            1.354e-02  1.079e-02   1.254 0.210395    
## H3AcK18_N        2.401e-03  2.527e-02   0.095 0.924349    
## EGR1_N          -6.016e-03  2.619e-02  -0.230 0.818406    
## H3MeK4_N         1.227e-02  2.154e-02   0.570 0.569097    
## CaNA_N          -1.749e-03  3.419e-03  -0.512 0.609159    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.005885 on 476 degrees of freedom
## Multiple R-squared:  0.8641, Adjusted R-squared:  0.8427 
## F-statistic: 40.37 on 75 and 476 DF,  p-value: < 2.2e-16

4 Model diagnostics

The first graph shows that the residues are unevenly distributed. We can see dispersion heteroskedasticity. Also here we can notice nonlinearity of relationship as not all observations are in the +/- 2 standard deviation zone. In addition, the residuals are unevenly distributed, we conclude that the variables are not independent.

Cook’s distance graph shows that there is no influential observations.

Distribution graf shows that residual’s distribution is slightly differs from normal .

To assess the presence of multicollinearity in our model, we built a correlation matrix, but visualized only significant interactions (significant level = 0.8).

As we can see, there are quite a few interactions between dependent variables.

Based on all the diagnostics of the model, we can conclude that the use of a linear model in this case is not an advantageous solution.

We can try to optimize this model.

5 Improving model quality

After removing 12 predictors from the model, with the largest VIF, multicollinearity is still observed in the model.

##  DYRK1A_N    BDNF_N    NR2A_N    pAKT_N   pBRAF_N pCAMKII_N   pCREB_N    pELK_N 
## 18.761162 15.191238 20.760271 19.833740  9.486633 12.983859 13.242561  2.250339 
##    pJNK_N    pMEK_N    pNR1_N   pNR2A_N  pPKCAB_N    pRSK_N     AKT_N    BRAF_N 
## 22.297602 21.584814 23.373532 21.317910 19.454005 13.473465 14.546179 17.881495

Only 16 VIFs are showm.

Thus, it makes no sense to further try to optimize the model. Better to use another method.

6 PCA creation

We performed principal component analisys of mice dataframe to reduce a number of dimensions.

The graph shows the contribution of the components to the total variability. For further analysis, we will leave only the components whose contribution is greater than in Broken Stick Model.

6.1 Ordination

After that, we built the ordination in the axes of the first two principal components

Unfortunately, grouping each class separately does not give clear visualization. Therefore, I tried to build an ordination with division into two types of mice (stimulated / non stimulated).

We can see that now the groups are more pronounced for stimulated and non stimulated mice. However there are almost no difference between Control and Down syndrom mice and stimulation with both saline and memantine affects them equally.

6.2 Correlation biplot

We can conclude that there are a lot of proteins correlated with each other. Probably we should remove some of them from analisys.

6.3 3D plots for the first 3 components

As we see see points do not form any logical clusters.

We can try to plot graphs by grouping points in a different way.

We tried to plot graphs for trisomy / non-trisomy groups and stimulated / non-stimulated mice.

Only on the graph showing the groups of stimulated / not stimulated mice, we can see some differences in the groups. (As at 2D PCA plot)

6.4 Percentage of variability attributable to each component

Since the contribution of the last components is too small, only the first 15 are displayed

PC_N percent
PC1 29.9334664
PC2 17.3490002
PC3 9.8525452
PC4 9.3007018
PC5 4.2628791
PC6 3.9510904
PC7 3.0744486
PC8 2.9289286
PC9 2.3776033
PC10 1.6606732
PC11 1.4437148
PC12 1.2767892
PC13 1.1207064
PC14 0.9102884
PC15 0.7937266

7 Search for differential proteins using DESeq2

We’ve count differences using DESeq2 for all possible conditions (together and separatly).

## log2 fold change (MLE): condition t.SC.s vs c.CS.m 
## Wald test p-value: condition t.SC.s vs c.CS.m 
## DataFrame with 6 rows and 6 columns
##           baseMean log2FoldChange     lfcSE      stat      pvalue        padj
##          <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
## SOD1_N     52.1509       1.167145 0.0640616  18.21910 3.64066e-74 2.80331e-72
## pERK_N     52.2565      -0.891397 0.0600636 -14.84089 7.96959e-50 3.06829e-48
## BRAF_N     34.9634      -0.671304 0.0540479 -12.42052 2.02242e-35 5.19088e-34
## GSK3B_N   116.9952      -0.316110 0.0266573 -11.85827 1.94957e-32 3.75292e-31
## DYRK1A_N   40.9928      -0.512317 0.0577757  -8.86734 7.49153e-19 1.15370e-17
## pPKCAB_N  160.0989      -0.386464 0.0485008  -7.96819 1.61018e-15 2.06640e-14

Here we can see DESeq2 calculated list of differrentially expressed proteins sorted by p_value (adjusted) taking into account all the classes at the same time.

Also we’ve prepared a gene list includes only reliably altered proteins. (it can be used in subsequent research )

In the table we can see the most changede genes per each condition. They can be used for Genquery or MSigDB.

Trisomy/Control Saline/Memantine Stimulated/Non_stimulated All
Tau_N pCAMKII_N SOD1_N SOD1_N
ITSN1_N BRAF_N pERK_N pERK_N
S6_N ERK_N CaNA_N BRAF_N
AcetylH3K9_N DYRK1A_N GSK3B_N GSK3B_N
APP_N pERK_N pPKCAB_N DYRK1A_N
DYRK1A_N NUMB_N DYRK1A_N pPKCAB_N
pNR1_N ELK_N BRAF_N NR1_N
P38_N AKT_N ITSN1_N Tau_N
MTOR_N CDK5_N Ubiquitin_N CaNA_N
BRAF_N SYP_N P38_N ITSN1_N
GluR3_N P38_N pMTOR_N pNR1_N
pNR2A_N pMTOR_N AKT_N Ubiquitin_N
pMTOR_N DSCR1_N S6_N pELK_N
NR1_N ITSN1_N IL1B_N NR2A_N
NR2B_N IL1B_N pCAMKII_N pNR2B_N
AMPKA_N CaNA_N pNR2A_N H3MeK4_N
pNR2B_N H3AcK18_N NR2B_N ADARB1_N
H3AcK18_N pPKCAB_N pNUMB_N H3AcK18_N
pPKCG_N ADARB1_N PKCA_N GluR3_N
pERK_N SOD1_N H3MeK4_N EGR1_N
PSD95_N pGSK3B_N ADARB1_N TRKA_N
NR2A_N NR1_N EGR1_N TIAM1_N
AKT_N BAX_N PSD95_N AcetylH3K9_N
IL1B_N Ubiquitin_N MTOR_N BCL2_N
pCASP9_N pNR2A_N pAKT_N BAD_N
SYP_N pP70S6_N SNCA_N pP70S6_N
pP70S6_N Bcatenin_N pMEK_N pCASP9_N
EGR1_N PSD95_N NUMB_N RRP1_N
pCFOS_N pRSK_N ERK_N PKCA_N
SNCA_N NR2B_N pGSK3B_Tyr216_N pCREB_N

We visualized the differences in expression in several proteins with the lowest p adjusted to estimate the effect of different influence at mice.